Abstract. – Recommender systems are a very promising way to address the problem of overabundant information for online users. Although information filtering for online commercial systems has received much attention recently, almost all previous works are dedicated to designing new algorithms and treat the user-item bipartite networks as given and constant information. However, many problems of recommender systems, such as the cold-start problem (i.e. low recommendation accuracy for small-degree items), are actually due to the limitations of the underlying user-item bipartite networks. In this letter, we propose a strategy to enhance the performance of existing recommendation algorithms by directly manipulating the user-item bipartite networks, namely by adding some virtual connections to the networks. Numerical analyses on two benchmark data sets, MovieLens and Netflix, show that our method can remarkably improve the recommendation performance. Specifically, it not only improves the recommendation accuracy (especially for small-degree items), but also helps the recommender systems generate more diverse and novel recommendations.

Introduction. – In the Internet era, the rapid growth of the World Wide Web leads to a serious problem of information overload: people now face too many choices to be able to find the most relevant ones [1]. So far, the most promising way to efficiently filter the abundant information is to employ personalized recommendations [1–3], that is, to use the personal history record of a user to uncover his preferences and to return to each user the items most relevant to his taste [3]. For instance, youtube.com uses people's video viewing records to suggest videos of potential interest to each individual.

There are already many recommendation algorithms for online user-item commercial systems. Among these algorithms, the simplest one is popularity-based recommendation, which recommends the most popular items to users. However, such recommendations are not personalized, so that identical items are recommended to individuals with very different tastes. By comparison, collaborative filtering makes use of collective data from individual preferences to provide personalized recommendations [4–6]. Recently, recommendation algorithms have been proposed from a physics perspective. For example, the process of mass diffusion (MD) was applied on the user-item bipartite networks to explore items of potential interest for a user [7]. The mass diffusion algorithm outperforms the previous ones in recommendation accuracy. However, this method is still biased towards popular items even if individual preferences are considered. An alternative approach, based on heat conduction (HC) on the user-item graphs, was thus introduced [5]. This algorithm provides users with many novel items and leads to diverse recommendations among users. However, HC has low accuracy compared with MD. This drawback was eventually solved by combining MD with HC in a hybrid approach, which can be tuned to obtain significant improvement in both recommendation accuracy and item diversity [5]. More recently, the long-term influence of such a hybrid approach on network evolution has been studied [10].

However, all these methods focus on improving the recommendation from the system point of view. The recommendation of items with little information is actually still a critical challenge [J]. For fresh or unpopular items (also called niche items), it is very difficult to predict the potential users who are going to be interested in them, owing to the lack of historical records. This problem is usually referred to as the cold-start problem, and much research has been dedicated to solving it. Related works concerning this issue are mainly based on modifying the existing methods by introducing some parameters [11–13].

Different from previous works, in this letter we try to solve the cold-start problem in a very fundamental way. Instead of designing a new recommendation algorithm, we make use of the MD algorithm and solve the cold-start problem by directly manipulating the underlying user-item bipartite networks [14, 15]. Actually, the idea of network manipulation has been applied to enhance many kinds of network functions, such as synchronization [16, 17], traffic dynamics [18], percolation [19, 20], navigation [21] and so on. In our case, we first analyze the historical record of each item and accordingly add some virtual connections to the networks (especially for the small-degree items) to provide the recommendation algorithm with more information. By using the MD algorithm, we find that the recommendation accuracy for the small-degree items can be largely enhanced after manipulating the networks. Further tests on the overall recommendation metrics show that our method helps the MD algorithm outperform the hybrid approach of the MD and HC algorithms in both recommendation accuracy and diversity.

Recommendation algorithms. – Online commercial systems can be well described by user-item bipartite networks. If a user collects an item, a link is drawn between them. Specifically, we consider a system of $N$ users and $M$ items represented by a bipartite network with adjacency matrix $A$, where the element $a_{i\alpha} = 1$ if user $i$ has collected item $\alpha$, and $a_{i\alpha} = 0$ otherwise (throughout this paper we use Greek and Latin letters, respectively, for item- and user-related indices).

There are many recommendation algorithms. In this letter, we mainly consider Mass Diffusion (MD), Heat Conduction (HC) and the hybrid of these two algorithms (Hybrid). We first briefly describe these algorithms.

For a target user $i$, the MD algorithm [7] starts by assigning one unit of resource to each item collected by $i$, and redistributes the resource through the user-item network. We denote by the vector $f^i$ the initial resources on items, where the $\alpha$-th component $f^i_\alpha$ is the resource possessed by item $\alpha$. Recommendations for user $i$ are obtained by setting the elements of $f^i$ to $f^i_\alpha = a_{i\alpha}$, in accordance with the items the user has already collected. The redistribution is represented by $\tilde{f}^i = W f^i$, where

$$W_{\alpha\beta} = \frac{1}{k_\beta} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j}$$ (1)

is the diffusion matrix, with $k_\beta = \sum_{l=1}^{N} a_{l\beta}$ and $k_j = \sum_{\gamma=1}^{M} a_{j\gamma}$ denoting the degrees of item $\beta$ and user $j$, respectively. The resulting recommendation list of uncollected items is then sorted according to $\tilde{f}^i_\alpha$ in descending order. Physically, the diffusion is equivalent to a three-step random walk starting with $k_i$ units of resource on the target user $i$. The recommendation score of an item is taken to be the amount of resource it has gathered after the diffusion. This algorithm was shown to achieve a high recommendation accuracy.
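The diffusion described above can be sketched in a few lines of pure Python. This is only an illustrative sketch, not the authors' implementation: the function name `mass_diffusion` and the dense 0/1 list-of-lists representation of the adjacency matrix are assumptions made here for clarity.

```python
def mass_diffusion(adj, user):
    """Score all items for `user` by mass diffusion.

    adj[i][a] is 1 if user i has collected item a, else 0 (assumed layout).
    """
    n_users, n_items = len(adj), len(adj[0])
    item_deg = [sum(adj[i][a] for i in range(n_users)) for a in range(n_items)]
    user_deg = [sum(row) for row in adj]

    # One unit of resource on every item the target user has collected.
    f = [float(x) for x in adj[user]]

    # Items -> users: each item splits its resource equally among its users.
    on_users = [
        sum(adj[j][b] * f[b] / item_deg[b] for b in range(n_items) if item_deg[b])
        for j in range(n_users)
    ]

    # Users -> items: each user splits what it holds equally among its items.
    return [
        sum(adj[j][a] * on_users[j] / user_deg[j] for j in range(n_users) if user_deg[j])
        for a in range(n_items)
    ]
```

On a toy network with two users and three items, `mass_diffusion([[1, 1, 0], [0, 1, 1]], 0)` spreads the two initial units over all three items, and the scores still sum to the degree of the target user because the diffusion conserves resource. Only the uncollected items would then be ranked.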

The HC algorithm [5] works similarly to the MD algorithm, but instead follows a conductive process represented by

$$W^{\mathrm{HC}}_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j}.$$ (2)

Physically, the recommendation score can be interpreted as the temperature of an item, which is the average temperature of its nearest neighborhood, i.e. its connected users. The higher the temperature of an item, the higher its recommendation score. With this algorithm, items with small degree can receive relatively high recommendation scores and finally be promoted to the top of the recommendation list.

The hybrid algorithm of MD and HC was proposed in [5], with the new recommendation score given by $\tilde{f}^i = W^{\mathrm{H}} f^i$ and

$$W^{\mathrm{H}}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda} k_\beta^{\lambda}} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j},$$ (3)

where the parameter $\lambda$ adjusts the relative weight between the two algorithms. When $\lambda$ increases from 0 to 1, the hybrid algorithm changes gradually from HC to MD. This hybrid approach was shown to achieve both accurate and diverse recommendations.
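To make the interpolation between HC and MD concrete, the sketch below assumes the commonly used hybrid transfer matrix $W_{\alpha\beta} = k_\alpha^{-(1-\lambda)} k_\beta^{-\lambda} \sum_j a_{j\alpha} a_{j\beta} / k_j$, so that `lam = 1` reduces to MD and `lam = 0` to HC. The name `hybrid_scores` and the dense list-of-lists input are hypothetical choices for illustration, not the authors' code.

```python
def hybrid_scores(adj, user, lam):
    """Hybrid MD/HC scores for `user`; lam=1 gives MD, lam=0 gives HC.

    adj[i][a] is 1 if user i has collected item a, else 0 (assumed layout).
    """
    n_users, n_items = len(adj), len(adj[0])
    item_deg = [sum(adj[i][a] for i in range(n_users)) for a in range(n_items)]
    user_deg = [sum(row) for row in adj]
    f = adj[user]  # initial resource: 1 on collected items, 0 elsewhere

    scores = []
    for a in range(n_items):
        total = 0.0
        for b in range(n_items):
            if not f[b] or not item_deg[a] or not item_deg[b]:
                continue
            # Sum over users who collected both items, weighted by 1 / k_j.
            overlap = sum(
                adj[j][a] * adj[j][b] / user_deg[j]
                for j in range(n_users) if user_deg[j]
            )
            total += overlap / (item_deg[a] ** (1 - lam) * item_deg[b] ** lam)
        scores.append(total)
    return scores
```

For the toy network `[[1, 1, 0], [0, 1, 1]]` and target user 0, `lam = 1` favours the popular item 1, whereas `lam = 0` assigns to each item the average "temperature" of its users and therefore scores the low-degree item 2 relatively higher.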

Data. – In order to test the performance of the recommendation results, we use two benchmark data sets in this letter. The first one is the MovieLens data [55], which has 1,682 movies (items) and 943 users. The other is the Netflix data [53], consisting of 10,000 users and 6,000 movies. The data sets are random samplings of user activity records in these two online systems. In both data sets, users can rate movies on a scale from 1 to 5 (i.e. worst to best). Here, only ratings larger than 2 are considered as links. After this preliminary filtering, there are 82,520 links in the MovieLens data and 701,947 links in the Netflix data. Each data set is then randomly divided into two parts: the training set ($E^T$) and the probe set ($E^P$). The training set contains 90% of the original data, and the recommendation algorithm runs on it. The probe set has the remaining 10% of the data and is used to test the performance of the recommendation results.
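The preprocessing just described (keep ratings larger than 2, then split the links 90%/10% at random) can be sketched as follows. The function names, the `(user, item, rating)` tuple layout and the fixed random seed are assumptions introduced here for illustration.

```python
import random


def build_links(ratings, threshold=2):
    """Keep (user, item) pairs whose rating is larger than `threshold`."""
    return [(user, item) for user, item, rating in ratings if rating > threshold]


def split_links(links, train_fraction=0.9, seed=0):
    """Randomly divide the links into a training set and a probe set."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

For example, `build_links([(0, 0, 5), (0, 1, 2), (1, 0, 3)])` keeps only the two links with ratings 5 and 3, and `split_links` applied to ten links returns nine training links and one probe link.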

Fig. 1: (Color online) The overall ranking score $R$ under different $\delta$ and $Q$ in (a) MovieLens and (b) Netflix, and the local ranking score $R_{k\le5}$ under different $\delta$ and $Q$ in (c) MovieLens and (d) Netflix.

Fig. 2: (Color online) The overall ranking score $R$ and the local ranking score $R_{k\le5}$ under different $\delta$ when $Q = 2$ in (a), (c) MovieLens and $Q = 1$ in (b), (d) Netflix. The vertical dashed line is the optimal $\delta$ we used.

The network manipulating method. – The network manipulating (NM) method takes place after dividing the data into $E^P$ and $E^T$. The main idea of NM is to add some virtual links to the training set $E^T$, so that the niche items will have more information to provide to the recommendation algorithm. Denoting by $Q$ the fraction of links added, the total number of virtual links is $Q|E^T|$. For each item, the probability of receiving a virtual link is related to its degree, i.e. $p_\alpha \propto k_\alpha^{-\delta}$, where $\delta$ is a tunable parameter. When $\delta > 0$, items with smaller degree tend to receive more links, and vice versa. Supposing an item $\alpha$ is selected to receive a link, the virtual link connects it to the user who has the highest average similarity to the existing selectors of item $\alpha$. In this letter, the similarity is calculated by the Salton index [24] as

$$s_{ij} = \frac{|\Gamma(i) \cap \Gamma(j)|}{\sqrt{k_i k_j}},$$ (4)

where $\Gamma(i)$ denotes the set of neighbors of user $i$. After adding virtual connections to the networks, we employ the MD algorithm to make the recommendation, and this combination is denoted as "MD with NM". In the following discussion, we compare this method to the original MD algorithm and the hybrid approach of MD and HC.
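A minimal sketch of the link-adding step is given below. Several details are not fixed by the description above and are assumptions of this sketch: item degrees and user similarities are computed once from the original training links and are not updated as virtual links are added, a virtual link never duplicates an existing link, and ties in similarity are broken by the smallest user label. The names `salton` and `add_virtual_links` are hypothetical.

```python
import math
import random


def salton(items_i, items_j):
    """Salton index between two users, given their sets of collected items."""
    if not items_i or not items_j:
        return 0.0
    return len(items_i & items_j) / math.sqrt(len(items_i) * len(items_j))


def add_virtual_links(train, q, delta, seed=0):
    """Return int(q * len(train)) virtual (user, item) links, if that many fit."""
    rng = random.Random(seed)
    user_items, item_users = {}, {}
    for user, item in train:
        user_items.setdefault(user, set()).add(item)
        item_users.setdefault(item, set()).add(user)

    items = sorted(item_users)
    # Selection weight k^(-delta): delta > 0 favours small-degree items.
    weights = [len(item_users[a]) ** (-delta) for a in items]
    linked = {a: set(users) for a, users in item_users.items()}
    target = int(q * len(train))
    virtual = []

    while len(virtual) < target and any(weights):
        idx = rng.choices(range(len(items)), weights=weights)[0]
        item = items[idx]
        candidates = sorted(u for u in user_items if u not in linked[item])
        if not candidates:
            weights[idx] = 0.0  # item is already linked to every user
            continue
        selectors = item_users[item]  # original selectors only (assumption)
        best = max(
            candidates,
            key=lambda u: sum(salton(user_items[u], user_items[v]) for v in selectors)
            / len(selectors),
        )
        linked[item].add(best)
        virtual.append((best, item))
    return virtual
```

The returned links would be appended to the training set before running MD. Whether degrees and similarities should instead be refreshed after each added link is a design choice that this sketch deliberately leaves open.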

Metrics for recommendation. – An effective recommendation should be able to accurately find the items that users like. In order to measure the recommendation accuracy, we make use of the ranking score ($R$). Specifically, $R$ measures whether the ordering of the items in the recommendation list matches the users' real preferences. As discussed above, the recommender system provides each user with a ranking list that contains all his uncollected items. For a target user $i$, we calculate the position of each of his links in the probe set. Supposing one of his uncollected items $\alpha$ is ranked at the 5th place and the total number of his uncollected items is 100, the ranking score $R_{i\alpha}$ will be 0.05. In a good recommendation, the items in the probe set should be ranked higher, so that $R$ will be smaller. Therefore, the mean value of $R_{i\alpha}$ over all the user-item relations in the probe set can be used to evaluate the recommendation accuracy as

$$R = \frac{1}{|E^P|} \sum_{i\alpha \in E^P} R_{i\alpha}.$$ (5)

The smaller the value of $R$, the higher the recommendation accuracy.
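As an illustration of the ranking score for a single user, the sketch below ranks the uncollected items by score and returns the relative position of each probe item. The function name and argument layout are assumptions; the overall $R$ would be the mean of these values over all probe links of all users.

```python
def ranking_scores(scores, collected, probe_items):
    """Relative rank of each probe item among one user's uncollected items.

    scores[a] is the recommendation score of item a for this user.
    """
    uncollected = [a for a in range(len(scores)) if a not in collected]
    ordered = sorted(uncollected, key=lambda a: -scores[a])
    position = {a: rank for rank, a in enumerate(ordered, start=1)}
    return [position[a] / len(ordered) for a in probe_items]
```

With 100 uncollected items and a probe item ranked 5th, the function reproduces the value 0.05 used in the example above.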

In reality, online systems present users with only the top part of the recommendation list. Therefore, we further consider a more practical measure of recommendation accuracy called precision, which takes into account only each user's top-$L$ items in the recommendation list. For each user $i$, the precision of recommendation is calculated as

$$P_i(L) = \frac{d_i(L)}{L},$$ (6)

where $d_i(L)$ represents the number of user $i$'s deleted links (i.e. links in the probe set) contained in the top-$L$ places of the recommendation list. For the whole system, the precision $P(L)$ is obtained by averaging the individual precisions over all users with at least one link in the probe set.
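A per-user precision can be sketched as follows, assuming the same score-list layout as before; `precision_at_l` is a hypothetical helper name. The system-level $P(L)$ would average this value over all users with at least one probe link.

```python
def precision_at_l(scores, collected, probe_items, length):
    """Fraction of the top-`length` uncollected items that are probe items."""
    uncollected = [a for a in range(len(scores)) if a not in collected]
    top = sorted(uncollected, key=lambda a: -scores[a])[:length]
    hits = sum(1 for a in top if a in probe_items)
    return hits / length
```

For instance, with scores `[0.9, 0.1, 0.8, 0.7, 0.2]`, collected item 0 and probe items `{2, 4}`, the top-2 list is items 2 and 3, so one of the two recommended items is a hit.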

Predicting what a user likes from the list of best sellers is generally easy in recommendation, while uncovering users' very personalized preferences (i.e. uncovering the unpopular items in the probe set) is much more difficult and important. Therefore, diversity should be considered as another significant aspect of recommender systems besides accuracy. In this letter, we employ two kinds of diversity measurement: interdiversity and novelty.

Table 1: The performance of different methods in the MovieLens and Netflix data. The recommendation list length is set as $L = 20$. In the MD with NM method, the parameter $\delta$ is chosen as $-0.05$. In the hybrid method, all the metrics are calculated under the optimal parameter as discussed in ref. [9]. The entries corresponding to the best performance over all methods are marked with an asterisk.

Network    Method       $R$      $R_{k\le5}$  $P(20)$  $H(20)$  $N(20)$
MovieLens  Original MD  0.0933   0.7324       0.1427   0.7161   303.8
MovieLens  MD with NM   0.0670*  0.4089*      0.2720*  0.8451*  254.8*
MovieLens  Hybrid       0.0759   0.5059       0.1532   0.8055   276.7
Netflix    Original MD  0.0457   0.5618       0.0886   0.5443   2803
Netflix    MD with NM   0.0454   0.4713*      0.0918*  0.5613*  2784*
Netflix    Hybrid       0.0450*  0.5174       0.0885   0.5469   2806

Fig. 3: (Color online) $R_k$ vs. item degree $k$ when using different methods in (a) MovieLens and (b) Netflix data sets.

The interdiversity mainly considers how different users' recommendation lists are from each other. Here, we measure it by the Hamming distance. Denoting by $C_{ij}(L)$ the number of common items in the top-$L$ places of the recommendation lists of users $i$ and $j$, their Hamming distance can be calculated as

$$H_{ij}(L) = 1 - \frac{C_{ij}(L)}{L}.$$ (7)

Clearly, $H_{ij}(L)$ is between 0 and 1, which correspond respectively to the cases where $i$ and $j$ have the same or entirely different recommendation lists. Again, averaging $H_{ij}(L)$ over all pairs of users, we obtain the mean Hamming distance $H(L)$. A more personalized recommendation results in a higher $H(L)$.
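The mean Hamming distance over all user pairs can be sketched as below, assuming each user's top-$L$ list is already available as a list of item labels; `mean_hamming_distance` is a hypothetical name.

```python
from itertools import combinations


def mean_hamming_distance(top_lists, length):
    """Average of 1 - (common items) / length over all pairs of users."""
    distances = [
        1 - len(set(a) & set(b)) / length
        for a, b in combinations(top_lists, 2)
    ]
    return sum(distances) / len(distances)
```

Identical lists give 0 and disjoint lists give 1. For large systems the number of pairs grows quadratically, so in practice one might estimate $H(L)$ from a random sample of user pairs; that shortcut is a suggestion of this note, not something stated in the text.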

The novelty measures the average degree of the items in the recommendation list. Users may already have learned of popular items through other channels, whereas it is hard for them to find relevant but unpopular items. Therefore, a good recommender system should prefer to recommend small-degree items. The novelty metric can be expressed as

$$N_i(L) = \frac{1}{L} \sum_{\alpha \in O^i} k_\alpha,$$ (8)

where $O^i$ represents the recommendation list for user $i$. A low mean popularity $N(L)$ for the whole system indicates highly novel and unexpected recommendations.
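The novelty metric is simply the mean degree of the recommended items, averaged over users. A minimal sketch, assuming the top-$L$ lists and a mapping from item to degree are given (both names are illustrative):

```python
def mean_novelty(top_lists, item_degree):
    """Average degree of recommended items, averaged over all users."""
    per_user = [
        sum(item_degree[a] for a in top) / len(top)
        for top in top_lists
    ]
    return sum(per_user) / len(per_user)
```

For two users with lists `[0, 1]` and `[1, 2]` and item degrees 10, 4 and 2, the per-user values are 7 and 3, so the system-level novelty is 5. Lower values mean that less popular items are being recommended.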

Results. – We begin our analysis with the recommendation accuracy, since it is one of the most important aspects in evaluating recommendation results. We first investigate the ranking score $R$ under different $\delta$ and $Q$. Since the NM method partially aims at solving the cold-start problem, we also define a local ranking score, which is the average ranking score of the items with degree not larger than 5 (denoted as $R_{k\le5}$). The results on the MovieLens and Netflix data are reported in fig. 1. Clearly, with more links added to the network, $R_{k\le5}$ becomes smaller. Consequently, the overall $R$ is improved. Given the value of $Q$, a larger $\delta$ yields a lower $R_{k\le5}$, which means that the recommendation for small-degree items becomes more accurate. However, the overall $R$ does not change monotonically with $\delta$. For each $Q$, there is a corresponding optimal $\delta$ which yields the best $R$.

Fig. 4: (Color online) The precision under different $\delta$ when $Q = 2$ in (a) MovieLens and $Q = 1$ in (b) Netflix. The vertical dashed line is the optimal $\delta$ we used.

For solving the cold-start problem, fig. 1(c) and (d) suggest that a large $Q$ generally works better. However, in order to keep the computational time of the recommendation algorithm reasonable, we select $Q = 2$ for the MovieLens data and $Q = 1$ for the Netflix data, and keep these values in the following discussion. The results for the ranking score are studied in more detail in fig. 2. In this figure, besides the curve of MD with NM, we plot the results of the original MD and Hybrid algorithms (without adding any virtual links) for comparison. For the overall $R$, negative $\delta$ clearly works better. However, positive $\delta$ is beneficial for improving the ranking score of small-degree items. In our simulation, we find that $\delta$ near 0 ($\delta^* = -0.05$) performs best. As we can see from fig. 2, under this $\delta^*$, the overall $R$ remarkably outperforms not only the original MD algorithm but also the hybrid algorithm in the MovieLens data. In the Netflix data, $\delta^*$ yields an $R$ similar to those of the original MD and hybrid algorithms. However, the MD with NM method outperforms the other two algorithms in $R_{k\le5}$ in both data sets. For the value of each metric, see table 1.

Improving information filtering via network manipulation

Table 2: The performance of different methods when the recommendation list length varies in the MovieLens and Netflix data. In the MD with NM method, the parameter $\delta$ is chosen as $-0.05$. In the hybrid method, all the metrics are calculated under the optimal parameter as discussed in ref. [9]. The entries corresponding to the best performance over all methods are marked with an asterisk.

Network    Method       $P(50)$  $P(100)$  $H(50)$  $H(100)$  $N(50)$  $N(100)$
MovieLens  Original MD  0.0948   0.0646    0.6395   0.5418    252.1    215.6
MovieLens  MD with NM   0.1565*  0.0913*   0.8078*  0.7523*   203.5*   171.9*
MovieLens  Hybrid       0.1031   0.0707    0.7699   0.7299    220.9    178.2
Netflix    Original MD  0.0594   0.0419    0.4222   0.3496    2337     1876
Netflix    MD with NM   0.0621*  0.0450*   0.4334*  0.3784*   2326*    1860*
Netflix    Hybrid       0.0596   0.0420    0.4243   0.3522    2335     1874


To show how the ranking score varies across items of different degree, we additionally investigate an item-degree-dependent ranking score $R_k$ [25]. $R_k$ is defined as the average ranking score over items with the same degree. In fig. 3, the relation between $R_k$ and the item degree $k$ is displayed for the MovieLens and Netflix data at the optimal parameter $\delta^* = -0.05$. Besides the MD with NM method, we also plot the results of the original MD and hybrid methods for comparison. The ranking score of small-degree items is significantly improved by adding the virtual connections. Moreover, the ranking score of large-degree items is effectively preserved.
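Both the degree-dependent score $R_k$ and the local score for items with degree not larger than 5 are simple groupings of the per-link ranking scores. The sketch below assumes that each probe link is available as a pair (item degree, ranking score); the function names and this record layout are illustrative assumptions.

```python
from collections import defaultdict


def ranking_score_by_degree(records):
    """Average ranking score of probe links, grouped by item degree."""
    buckets = defaultdict(list)
    for degree, score in records:
        buckets[degree].append(score)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


def local_ranking_score(records, max_degree=5):
    """Average ranking score over probe links whose item degree <= max_degree."""
    low = [score for degree, score in records if degree <= max_degree]
    return sum(low) / len(low)
```

For the records `[(1, 0.8), (1, 0.6), (3, 0.4), (10, 0.1)]`, the degree-1 bucket averages to about 0.7 and the local score over degrees up to 5 is about 0.6, while the high-degree link does not enter the local score.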

As discussed above, another way to estimate the accuracy of the recommendation results is the precision. Here, we set the recommendation list length to $L = 20$, and report the precision of the MD with NM, original MD and hybrid methods in fig. 4. In ref. [5], it is shown that the hybrid approach can improve the precision compared to the original MD method, so that the green line is higher than the red one in fig. 4. In the MD with NM method, a negative $\delta$ generally works better for precision in both data sets, which is consistent with the case of the overall $R$. Furthermore, we observe that the MD with NM method largely outperforms the other two algorithms in precision under the optimal $\delta^*$.

In addition to accuracy, the recommendation diversity is of great significance. With the interdiversity, we can estimate how different the recommendation results are from user to user. A larger Hamming distance indicates a more personalized recommendation. Besides, the novelty is also an important aspect. With a small novelty, the average degree of the recommended items is low, so that more fresh items appear in the recommendation list. Setting the recommendation list length as $L = 20$, the related results of different methods are reported in fig. 5, and the detailed values can be seen in table 1.

Fig. 5: (Color online) The Hamming distance and novelty under different $\delta$ when $Q = 2$ in (a), (c) MovieLens and $Q = 1$ in (b), (d) Netflix. The recommendation list length is set as $L = 20$. The vertical dashed line is the optimal $\delta$ we used.

In fig. 5, we immediately notice that the MD with NM method yields a larger Hamming distance and a smaller novelty index than the original MD and hybrid methods. These results indicate that our method provides more diverse recommendation results for users. When adding virtual links, the parameter $\delta$ controls the preference of the virtual links for items of different degree. With a positive $\delta$, more virtual links are added to small-degree items, and small-degree items are promoted to higher places in users' recommendation lists. Accordingly, the Hamming distance (novelty) should in principle increase (decrease) with $\delta$, as shown for the Netflix data in fig. 5. However, we remark that the effect of $\delta$ on these two metrics is not always monotonic, since these two metrics focus only on a small number of top-ranked items in the recommendation list. In the MovieLens data, we can even observe that a small positive $\delta$ yields a small Hamming distance and a large novelty. However, if we increase $\delta$ to a certain larger positive value in our simulation, the Hamming distance increases and the novelty decreases in the MovieLens data.

In reality, the recommendation list length $L$ varies across online commercial systems. Therefore, we further investigate the cases where $L = 50$ and $L = 100$ for the metrics precision, Hamming distance and novelty. The results are reported in table 2. Consistent with the above case where $L = 20$, the MD with NM method outperforms the original MD and hybrid methods, which suggests that the improvement from the NM method is very robust.

Conclusion. – Information overabundance is a serious problem nowadays for online users. In order to filter irrelevant information, many recommendation algorithms have been proposed. In this field, one of the biggest challenges is the cold-start problem, i.e. new items have too little historical record to be correctly recommended. So far, all the methods dedicated to solving this problem have focused on modifying the existing methods by introducing some parameters. In this letter, we try to solve the problem by directly adding some virtual connections to the bipartite networks, so that the niche items have enough information for the recommendation algorithms. Interestingly, besides improving the recommendation accuracy (especially for small-degree items), our method can enhance the recommendation diversity compared to the well-known hybrid method of the mass diffusion and heat conduction algorithms.

In practice, it is actually not necessary to add too many virtual links to the networks if we only want to enhance the accuracy for the less popular items and more or less preserve the overall recommendation accuracy. Generally, adding 10% links will be sufficient to solve the cold-start problem (see fig. 1). Therefore, our method can be easily applied to real online commercial systems without increasing the computational complexity of the recommendation process too much. Finally, whether the current NM method is the optimal one for each recommendation algorithm is still unknown. For instance, some special algorithms such as the heat conduction algorithm, which mainly recommends niche items, might require a different strategy for adding virtual links. Related problems call for further investigation in the future.